1. Introduction


The goal of this research project is the automatic synthesis of signal understanding systems in
time-critical environments. In particular, we are designing and implementing a working prototype
computer system (called CHI) capable of automatic, knowledge based synthesis of harmonic set 
formation programs. CHI will accept a description of the desired program in a very high level 
language (and possibly other languages such as English, examples, or a special mathematical 
notation) and produce an efficient implementation of the program in a target language such as LISP. 
Progress has been made in determining appropriate ways to specify the application program, 
understanding the different set partitioning heuristics and algorithms that exist for doing the 
harmonic set formation, tracing the necessary steps which CHI must take to transform the program
specification into a target program, and designing the language in which CHI will operate. 


This report covers progress on the following tasks: completing a near-term demonstration capability 
of knowledge based programming using the existing PSI program synthesis system; specifying the 
harmonic set formation programs to be used as a target application; designing the CHI system for 
writing these (and other) programs; developing an in-house computing facility in support of this 
research; and disseminating results via technical publications. 


The knowledge based programming research effort under this contract was initiated on 27 
November 1978. However, the contract, with its requirement of quarterly technical reports, was not
signed until 23 March 1979. Since this was after the end of the first quarter of the contract, this 
report covers the first two quarters of the project. 


As a result of our discussions with Bob Engelmore of DARPA and Marv Denicoff and Gordon
Goldstein of ONR, reporting procedures have been clarified. In future quarters, the quarterly 
technical report, which is primarily a technical progress report, will be incorporated into the 
quarterly research and development status report. Details of substantial technical results will 
continue to be presented in special interim technical reports. 
2. Task 1: Near-Term Demonstration Capability


We plan to continue working on the existing PSI program synthesis system to provide a 
demonstration capability, to study the strengths and weaknesses of the system, and to support the
dissertations of Elaine Kant, Brian McCune, Lou Steinberg, and Dick Gabriel. Some of the actual
code and many of the ideas of the old system will be applied to CHI. 


2.1 PSI System Maintenance 

Since joining the knowledge based programming group, Beverly Kedzierski's primary goal has been 
the integration of new PSI subsystems into a working system. The acquisition phase of PSI, 
primarily the programs by Jerry Ginsparg, Lou Steinberg, and Dick Gabriel, was dealt with first. 
The parser has been brought up and is currently running independently. Maintaining and possibly 
enhancing the parser/interpreter is a continuing task that may eventually result in replacement of the 
parser by some other currently available one. The dialogue moderator of Steinberg has been 
integrated and compiled. Final progress toward a working acquisition phase awaits modification of 
the explainer by Gabriel. The next step will then be to integrate the program model builder and 
synthesis phase to obtain a complete running version of PSI.


2.2 Explainer 

Dick Gabriel is working on the explanation system, which produces an English explanation of the 
internal representation of a program acquired by PSI. 

The part that generates English is now working and has been tested extensively. It repeatedly 
makes transformations on an initial paragraph in order to improve its style and clarity. The 
problem is to find an appropriate "expert" within the explainer to make such an improvement.
Major progress has been made on the planning module, which is responsible for finding such an
expert. This is done using two criteria: (1) how well the expert’s knowledge applies to the current
text and (2) how well the expert will promote a non-repetitious prose style.

Routines for planning the initial paragraph structure of a description of a simple program have 
been written and are being tested with the improved English generator. It is expected that this first 
program will be explained within two months. 


2.3 Efficiency Expert

Elaine Kant has extended the efficiency module to handle the decisions involved in a news retrieval 
program, several variants of a classification program, and insertion and selection sorts. This work 
has included maintaining, and in some cases extending, the coding module and the interface between 
the program modelling language and the synthesis phase. The facilities for recording control and 
data flow in the programs being synthesized have been improved. The process of acquiring time 
estimates for the coding constructs used in the synthesis phase has been partially automated. We
expect this work to be extendible to CHI and partitioning algorithms.








3. Task 2: Specification of Harmonic Set Formation Programs 
The major milestone for the first year of this contract is to specify the ultimate goal, namely,
reprogramming a harmonic set formation program. The milestone includes defining the target 
program itself, defining the inputs to CHI that specify the target program, and analyzing the 
knowledge necessary to create it. This harmonic set formation program determines harmonic 
relationships between acoustic signals collected from an ocean environment. Harmonically related 
signals are manifestations of a single source (e.g., a ship's engine, pump, etc.) which appear at evenly 
spaced frequency intervals when transformed into a frequency versus time display. Our attempts in 
the past several months have included developing a specification of the tasks involved in finding 
harmonically related signals and in defining the rules of behavior of those signals. In addition,
efforts have been made to define sequences of refinements which could be made to the initial
program specification to move it closer to machine executable code. Several different refinement
paths have been considered. The problem specification and refinement path are being used to help 
develop an overall system design. 


3.1 Input Specification 

Harmonic set formation involves considering a set of lines (acoustic signals manifest in time) and
grouping those lines into subsets of harmonically related lines (those emanating from the same 
source). Reasonably well-defined rules have been developed to determine when lines are 
harmonically related. More generally, the problem is one of looking at all partitions (harmonically 
related groupings) of a set of lines and choosing one which generates the minimum cost (the cost 
inversely indicates the “goodness” of the harmonic relations). The cost function is multifaceted and 
includes not only the local cost of each group in the partition (e.g., is it a well-formed harmonic set),
but also the overall likelihood of this type of a partition (e.g., it is unlikely that each line would be in
a separate harmonic set). The factors which have been considered in defining the goodness of a
particular harmonic set include:

Are the frequencies of the lines in the set integer multiples of a fundamental frequency?

Do the lines originate in approximately the same location (area of the ocean)?

Do lines in the set manifest in a reversable harmonic pattern (e.g., first harmonic, second
harmonic)?

Do lines in the set tend to be less intense at higher frequencies?

Bob Drazovich has developed a set of these rules, written fairly concisely and precisely in English. 
An equivalent mathematical notation has been defined which may be used as an alternate input 
specification language or as an internal representation for the English rules. 
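
As an illustration only, the first factor listed above (whether the line frequencies are integer multiples of a fundamental) might be quantified along the lines of the following Python sketch. The function name `harmonic_misfit`, the scoring formula, and the sample frequencies are assumptions made for this example; they are not the rules developed by Drazovich.

```python
def harmonic_misfit(frequencies, fundamental):
    """Mean distance of each frequency from its nearest integer multiple
    of `fundamental`, as a fraction of the fundamental (0.0 = perfect fit).

    Hypothetical scoring, for illustration only.
    """
    if fundamental <= 0 or not frequencies:
        raise ValueError("need a positive fundamental and at least one line")
    total = 0.0
    for f in frequencies:
        ratio = f / fundamental
        nearest = max(1, round(ratio))      # harmonic numbers start at 1
        total += abs(ratio - nearest)
    return total / len(frequencies)


lines_hz = [60.0, 120.5, 179.0, 240.0]      # made-up line frequencies
good = harmonic_misfit(lines_hz, 60.0)      # lines lie close to multiples of 60 Hz
bad = harmonic_misfit(lines_hz, 47.0)       # the same lines fit 47 Hz poorly
```

A score of this kind could serve as one term of the local cost of a group; the remaining factors (location, pattern, intensity) would need their own measures.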
3.2 Analysis of Programming Knowledge Required

Steve Westfold has been considering the programming knowledge necessary to write harmonic set 
formation programs. Starting with the initial specification of the harmonic set formation task, efforts 
have been made to develop a sequence of transformations which move toward machine executable 
code. Activities have centered on identifying a method for implementing the specification which 
finds the minimum cost (as described above) partition of a set of lines (acoustic signals). Among the 
approaches considered were these: 

(1) Explicitly enumerate and evaluate the cost of each possible partition of the set of 
lines. This approach is easy to develop but impractical because of the large amount of
computation involved. Typical real world situations often involve the analysis of large
numbers of lines at a time. Explicit enumeration of all possible partitions is
prohibitively expensive computationally. 

(2) Start with an initial solution and move towards a better (lower cost) partition by 
some form of hill climbing. Initial solutions which were considered started with each 
line in a separate group (a singleton partition) and all lines in one group. In the case of 
a singleton partition, lines were then associated (combined) on the basis of rules of 
harmonic behavior. When starting with all lines in a single group, the group was 
evaluated for harmonic consistency, and inconsistent lines were removed (to another 
group) until the harmonic behavior rules were satisfied. 

(3) Use a divide and conquer approach by using one feature dimension at a time to 
guide the partitioning. The goal was to divide the overall set formation task into
subtasks by pregrouping the input lines. For example, lines could be grouped by 
location. The more manageable subtasks (smaller sets of lines) could then be analyzed 
separately using one or more of the other techniques specified in this section. 

(4) Use an operations research approach. This is an attempt to define the harmonic set 
formation task as an integer programming problem. The task would be refined into 
maximizing (or minimizing) a cost function (about the goodness of a particular 
partition) while satisfying a set of constraints about the behavior of lines in a specific 
set. 
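
Approach (2), started from the singleton partition, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used in the project: `hill_climb_merge`, `toy_cost`, the candidate fundamentals, and the sample frequencies are all hypothetical, and the cost function is a deliberately crude stand-in for the harmonic behavior rules.

```python
from itertools import combinations


def hill_climb_merge(lines, cost):
    """Greedy hill climbing from the singleton partition.

    `lines` is a list of hashable items; `cost` maps a partition (a list of
    frozensets) to a number, lower being better. Both are placeholders.
    """
    partition = [frozenset([line]) for line in lines]
    current = cost(partition)
    while len(partition) > 1:
        best = None
        # Try every pairwise merge and remember the one with the lowest cost.
        for a, b in combinations(range(len(partition)), 2):
            merged = [g for i, g in enumerate(partition) if i not in (a, b)]
            merged.append(partition[a] | partition[b])
            c = cost(merged)
            if c < current and (best is None or c < best[0]):
                best = (c, merged)
        if best is None:          # no merge lowers the cost: local minimum
            break
        current, partition = best
    return partition, current


def toy_cost(partition, fundamentals=(50.0, 60.0)):
    """Made-up cost: one unit per group, plus a large penalty for any group
    whose frequencies are not all multiples of one candidate fundamental."""
    total = 0.0
    for group in partition:
        fits = any(all(f % base == 0 for f in group) for base in fundamentals)
        total += 1.0 if fits else 100.0
    return total


groups, final_cost = hill_climb_merge([50.0, 100.0, 60.0, 120.0, 180.0], toy_cost)
```

With this toy cost the search ends with two groups, one per fundamental. Like any hill climbing, it can stop at a local minimum on a less forgiving cost function.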

In addition to considering refinements of the partitioning specification, attempts have been made to 
refine the rules of harmonic behavior. As may have been noticed above, the rules often denote 
general trends (e.g., harmonics generally become less intense at higher frequencies) which must be
specified in a more quantitative fashion for machine evaluation. However, exceptions to single rules 
do occur in this environment and often should not by themselves veto a potential harmonic 
relationship. The two refinement approaches which have been considered either specify each rule
such that only true or false is returned or specify each rule such that a weight (either positive or
negative) is returned. In the latter case, the final decision may be the (possibly weighted) sum of the
individual rules.
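
The weighted form of the rules can be sketched as below. Everything named here (`combine_rules`, `boolean_rule`, the two example rules, the weights, and the sample lines) is an assumption introduced for illustration; the report does not specify the weights or the threshold.

```python
def combine_rules(rules, candidate_set, threshold=0.0):
    """Accept `candidate_set` when the weighted sum of rule scores exceeds
    `threshold`. `rules` is a list of (weight, rule) pairs, where each rule
    returns a positive or negative score. All names here are hypothetical."""
    total = sum(weight * rule(candidate_set) for weight, rule in rules)
    return total > threshold, total


def boolean_rule(predicate):
    """Adapt a true/false rule to the weighted scheme (+1 or -1)."""
    return lambda lines: 1.0 if predicate(lines) else -1.0


# Each line is a (frequency_hz, intensity) pair; the values are made up.
lines = [(60.0, 9.0), (120.0, 7.0), (180.0, 7.5)]


def multiples_of_60(ls):
    return all(f % 60.0 == 0 for f, _ in ls)


def intensity_falls(ls):
    ordered = [i for _, i in sorted(ls)]          # intensities by frequency
    return all(a >= b for a, b in zip(ordered, ordered[1:]))


rules = [(2.0, boolean_rule(multiples_of_60)),
         (1.0, boolean_rule(intensity_falls))]
accepted, score = combine_rules(rules, lines)
```

Here the intensity rule fails (the third line is slightly stronger than the second), yet the set is still accepted because the frequency rule outweighs it. This is the behavior the text asks for: an exception to a single rule does not by itself veto a potential harmonic relationship.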

The partitioning syntheses considered by Steve Tappel to date deal with exhaustive partitioning, i.e.,
generating all partitions of a set. This is a simpler problem than heuristic partitioning. Even here, 
however, it has become apparent that, without strong domain support in the form of knowledge 
about properties of partitions and schemata of partitioning algorithms, the transformations are too 
difficult for an automatic system to handle. 













4. Task 3: Design of CHI


Two major problems exhibited by state-of-the-art program synthesis systems are, first, the increasing 
difficulty of maintaining an evolving knowledge base of rules about programming as it increases in 
size and, second, the need for restricting the use of resources during the synthesis process in order to 
code problems of a practical size. 

Jorge Phillips has developed an integrated language to specify both very high level programs and 
knowledge about programming in the form of rules. This language (called V, for Very high level 
language) will be used as the central framework for the CHI system. It attempts to provide a 
practical solution to the problems outlined in the preceding paragraph. The major goal is to 
facilitate algorithm development by providing a uniform basis for the specification of modular 
systems with arbitrary binding and accessing mechanisms. In addition, V provides a vehicle for the 
acquisition and modification of a knowledge base of rules about programming and of metarules that 
guide the application of these rules during the synthesis process. 

Two major research efforts have been launched in the pursuit of V's design goals: the design of the 
program description component of the language, of which an example is given below, and the design 
of the rule and metarule components of the language. V provides a very rich vocabulary of 
programming concepts which can be used to reflect either program building actions or knowledge 
base modification and enhancement operations.


4.1 Specification Language


At the program description level, the major problem is to provide V constructs and semantics which 
allow easy coding of high level algorithm specifications into languages with very different binding 
and dynamic regimes (e.g., LISP, ALGOL, and FORTRAN). To achieve this effect, V allows
explicit specification of variable binding and parameter passing mechanisms. 

The following V program description exhibits the basic features of the language. The algorithm 
repeatedly finds all news stories in a database which match the keyword which is input. 

module NEWS

    var DB : relation
                 STORY,
                 KEYWORD;

        INPUT : alternative
                 KEYWORD,
                 ESCAPE;

    begin
        input DB;
        loop
            input INPUT;
            exit if type?(INPUT) = ESCAPE;
            output inverse_image(DB, INPUT);
        end;
    end;

end NEWS;
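
For readers unfamiliar with V, the behavior of the NEWS program description above can be approximated in Python. This is a sketch under assumptions, not a translation CHI would produce: the relation is modelled as a set of (story, keyword) pairs, `ESCAPE` is a hypothetical sentinel object, and queries are passed in as a list rather than read interactively.

```python
def inverse_image(db, keyword):
    """Return every story related to `keyword` in `db`, a collection of
    (story, keyword) pairs standing in for the V relation DB."""
    return {story for story, kw in db if kw == keyword}


ESCAPE = object()   # hypothetical stand-in for the ESCAPE alternative


def news(db, inputs):
    """Answer keyword queries until ESCAPE is read. Returns the list of
    outputs instead of printing them, to keep the sketch easy to test."""
    results = []
    for item in inputs:
        if item is ESCAPE:
            break
        results.append(inverse_image(db, item))
    return results


db = {("story-1", "sonar"), ("story-2", "radar"), ("story-3", "sonar")}
answers = news(db, ["sonar", "lisp", ESCAPE, "radar"])
```

The choice of a linear scan for `inverse_image` is arbitrary; selecting a representation for the relation (for example, an index keyed by keyword) is exactly the kind of decision the synthesis system is meant to make.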
The following recursive algorithm finds all partitions of a set. At each level of recursion, it chooses
an element of the set to partition and merges it with all possible partitions of the original set minus
the removed element. This technique corresponds to divide and conquer using a singleton split. 

module P

    type PARTITION : set of set; "Powerset of an unspecified set"

    procedure entry PARTITIONS (S : set) : set of PARTITION;
        if S = PHI then PHI
        elseif size(S) = 1 then {{S}}
        else select x in S do
            UNION [y in PARTITIONS(S - {x})]
                ( ( UNION [z in y] { {{x} U z} U (y - {z}) } )  U  { {{x}} U y } )

end P;
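
The same singleton-split recursion can be written as a short Python sketch. It is offered as an illustration rather than as output of CHI; the function name `partitions` and the list-of-lists representation are assumptions. One deliberate difference from the V text: the empty set is given a single empty partition here, a common convention that keeps the recursion uniform.

```python
def partitions(items):
    """Return all partitions of `items` as a list of lists of blocks."""
    items = list(items)
    if not items:
        return [[]]                  # convention: one (empty) partition
    x, rest = items[0], items[1:]
    result = []
    for y in partitions(rest):
        # Add x to each existing block z of y in turn ...
        for i, z in enumerate(y):
            result.append(y[:i] + [[x] + z] + y[i + 1:])
        # ... or give x a block of its own.
        result.append([[x]] + y)
    return result


all_parts = partitions(["a", "b", "c"])
counts = [len(partitions(range(n))) for n in range(1, 8)]
```

The values in `counts` are the Bell numbers, which grow very quickly with the size of the set. This is why explicit enumeration, approach (1) of Section 3.2, is impractical for large numbers of lines.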


4.2 Rule Language 

At the metalevel, a language for expressing programming and efficiency knowledge has been defined 
by Jorge Phillips and Elaine Kant. The language is such that it allows complete expressibility of 
actions at the meta and synthesis levels. The basic entities dealt with in the language are program 
description nodes, rules (at all abstraction levels), history sequences of node transformations and rule 
applications, and data flow in the program description. 

Metarules are greatly simplified in this design by extensive use of prototypes of metalevel entities 
that describe their structure and properties. These prototypes allow for declarative specification of 
constraints on properties, as well as explicit and implicit properties. The latter are object properties 
which, instead of being an explicit part of the object, have an explicit method or procedure for 
computing their values. The metalanguage has been tested by coding a small number of rules used 
in PSI’s synthesis phase in the new formalism. 

Previous experience with PSI by Kant has pointed out the usefulness of features such as prototypes 
and data flow constructs that were not included in the original design of PSI. She has translated a 
number of the efficiency rules into the new language as a test. 


4.3 Codification of Programming Knowledge 

Other aspects of this research are concerned with the use of the V foundation in a preliminary 
testbed. Programming knowledge is currently being defined and extracted for the specification and 
codification of algorithms in the partitioning, combinatorial, and symbolic manipulation domains. A 
prototype system that embeds these ideas is currently being built in INTERLISP. 


5. Task 4: Computer Support


This task is managed by Brian McCune. 


5.1 F2 Computer 

DARPA funding for the Foonly F2 computer was provided in response to our Three-Year
Computing Addendum to Proposal MPL 79-044, Research on Knowledge Based Programming. The 
F2, which emulates a PDP-10, will be installed during June. 


5.1.1 Software 

The F2 will run the latest public-domain version (1.34) of the TENEX operating system. All 
standard TENEX software will be supported, including LISP, SAIL, PASCAL, FORTRAN, text 
editors, document compilers, ARPAnet software, and systems support software. Of particular 
importance to this contract is the availability of INTERLISP, the EMACS display editor, and the 
PUB document compiler for programming and document preparation. 


5.1.2 Peripherals 

Peripherals were selected for the F2 with the goal of utilizing state-of-the-art technology at 
reasonable cost. The disk drive is an Ampex DM 9300 CD, which features 312 megabytes of 
unformatted capacity (compared to the largest DEC drive of 200 megabytes), a data rate of 1.2 
megabytes per second, and the industry standard “storage module" interface. The tape drive is a 
Telex 6250, which features 9-track recording, a speed of 125 inches per second (compared to the 
more typical 45 or 75), recording densities of 1600 and 6250 bytes per inch (compared to today’s
minicomputer standard of 800 and 1600), maximum data rate of 781 kilobytes per second, automatic
loading and unloading of tapes, and a chassis which mounts in only half of a standard 19 inch
hardware cabinet. The printer/plotter is a Versatec 3200A, which features resolution of 200 points
per inch, printing or plotting at 500 lines of text per minute, paper 11 inches wide, and desktop size.
The modems, built by Universal Data Systems, have data rates of either 300 baud (full duplex) or
150/1200 baud (i.e., full duplex with 150 baud from the display terminal's keyboard and 1200 baud
to its screen) and are allowed to directly connect to the telephone network. 


5.1.3 Terminal System

As an interim display terminal, we are renting Datamedia 3025s, the standard interactive text editing 
terminal used by the ARPAnet community in the Stanford area. Currently we are considering the 
Z80-based Ann Arbor 4080 COMP AT with a microprocessor-based Microswitch keyboard and
Products Associates 150/1200 baud modem. This terminal would feature the top keyboard in the
industry (long in use at Stanford, MIT, and CMU), both ASCII and Stanford ASCII compatibility, 
Datamedia 2500 and 3025 compatibility, a 15 inch screen, display of 40 lines of text, and an integral 
modem. 
Eventually we want high resolution bitmap terminals in house. The current standard is the Grinnell
display system with an array of 1024 x 1024 bits, coupled with Microswitch keyboards. Before 
choosing this route, we are waiting to evaluate new progress in the area of personal computers that 
would provide distributed text editing and perhaps LISP, in addition to a high resolution display. 
Candidates that should be available within the next year are from Xerox, Three Rivers Computer, 
Image Automation, and MIT. 


5.1.4 ARPAnet Connection 

Shortly after the F2 is installed, Error Correcting Units (ECUs) from Associated Computer
Consultants (ACC) will be installed at SCI and Xerox PARC in order to connect the F2 to the
ARPAnet as a local host on the XEROX IMP. The F2 will be a limited server with host name
SCI-ICS. SRI is purchasing the ECUs for DARPA, and they will be connected by a 56 kilobaud
digital telephone line. McCune assisted in the planning coordination for this equipment between
DARPA, SCI, PARC, ACC, BBN, PT&T, and SRI.


5.1.5 Packet Radio Network Link 

We are adding a second BBN 1822 interface to SCI-ICS, a Distant Host interface to a packet radio
connected to the Bay Area Packet Radio Network. SCI-ICS will provide a testbed as the first sc
host on the PRnet. DARPA will supply and install the radio and TENEX software, including TCP 
and server and user protocol handlers. 


5.2 In-House TIP or IMP 

In anticipation of being provided a TIP or IMP by DARPA, we are installing enough telephone 
lines to handle many 50 kilobaud wideband circuits and TIP ports, as well as direct dialups to SCI- 
ICS and other, future computers. As discussed in Section IV of the Three-Year Computing 
Addendum, SCI already has a need for high-speed terminal access to the ARPAnet independent of 
the F2. Work on many DARPA contracts (listed in Appendix A, “Breadth of SCI’s ARPAnet 
Access Requirements") is slowed down by the fact that only 300 and 1200 baud dialup access to 
remote TIPs is available. In the future a second F2 will likely be procured and put on the 
ARPAnet. Using multiple ECU links to remote IMPs is an uneconomic way to provide network
access for more than one host. An in-house TIP would solve both of these problems.


5.3 Computing Lab 

SCI is building a computing laboratory separate from its current computer facilities, primarily for 
Defense Division computer science and signal processing research, including knowledge based 
programming, algorithm creation, distributed sensor nets, speech, vision, radar, and sonar. This 
laboratory will include the SCI-ICS F2, the ECU linking it to the ARPAnet, the packet radio
linking it to the PRnet, probably a link to the SCI corporate VAX, a PDP 11/35 realtime signal 
processing frontend to the F2, an LSI-11 De Anza color graphics system, a Grinnell 8-bit grayscale 
graphics system, two Tektronix storage tube graphics displays, and probably a second F2. SCI is 
6. Task 5: Technical Publications


6.1 Publications on PSI

With the editorial efforts of Kedzierski, a paper on PSI [Green et al.-79] was accepted for
presentation at the Sixth International Joint Conference on Artificial Intelligence (IJCAI-79), to be
held in Tokyo, Japan, this August. This paper summarizes recent progress on and present 
capabilities of PSI. 

A paper on knowledge for synthesizing sorting programs, by Cordell Green and Dave Barstow¹, was
published in Artificial Intelligence [Green & Barstow-78]. A revised version of Barstow’s thesis on
the PSI coder will appear in book form [Barstow-79].

Elaine Kant’s thesis on the PSI efficiency expert [Kant-79A] is nearly completed. A paper based on 
this work [Kant-79B] has been accepted for presentation at IJCAI-79. 

Brian McCune expects to have a draft of his thesis on the program model builder [McCune-79] 
completed in August 1979. 


6.2 Publications on CHI 

A paper by Cordell Green and Brian McCune outlining the goals of our current research [Green &
McCune-78] was presented at the Distributed Sensor Nets Workshop held at Carnegie-Mellon
University in December 1978. An expanded version of this paper, including a discussion of other
possible applications [Green & McCune-79], was given at the Technical Workshop on the
Application of Artificial Intelligence and Spatial Processing to Radar Signals for Automatic Ship 
Classification, held in New Orleans in February 1979. 

A draft of Jorge Phillips' thesis on the design of CHI [Phillips-79] should be available in July 1979. 


¹ now Assistant Professor of Computer Science at Yale University
Appendix A. Breadth of SCI’s ARPAnet Access Requirements


Below is a list of SCI’s current contracts and pending proposals with DARPA that use or will use
the ARPAnet, listing the contract name, principal investigator, and ARPAnet sites used, by DARPA
office. 


IPTO 

“Knowledge Based Programming", Cordell Green, USC-ISIC, USC-ISIE, SU-AI, and SU-SCORE 

“Algorithm Creation by Intelligent Systems", Cordell Green, USC-ISIC, USC-ISIE, SU-AI, and SU- 
SCORE (proposal pending) 

“Radar Target Classification”, Cordell Green and Roland Payne, TENEX systems (proposal pending 
to NAVELEX) 


TTO

"Surveillance Integration Automation Project", Robert Drazovich, I4-TENEX, MOFFETT-ARC,
USC-ISI, and SU-AI 

“Surgical Countermeasures”, A. J. Rockmore, USC-ECL 

STO 

“Performance Evaluation of Image Registration”, Hassan Mostafavi, USC-ECL 

“Cruise Missile Path Optimization”, Jim Marsh, USC-ECL (proposal pending) 

"Almost Terminal Viewing, Synthetic Aperture Radar", Fred Smith, USC-ECL (proposal pending) 

“Experimental Definition for Spaceborne Distributed Aperture Radar Concepts", Hugh Pearce, 
USC-ECL (proposal pending) 